Construction of an advanced in-car spoken dialogue corpus and its characteristic analysis

Authors

  • Itsuki Kishida
  • Yuki Irie
  • Yukiko Yamaguchi
  • Shigeki Matsubara
  • Nobuo Kawaguchi
  • Yasuyoshi Inagaki
Abstract

This paper describes an advanced spoken language corpus constructed by enhancing an in-car speech database. The corpus has the following characteristic features: (1) Advanced tags: in addition to tags for linguistic phenomena, advanced discourse tags such as sentential structures and utterance intentions have been provided for the transcribed texts. (2) Large scale: sentential structures and intentions are currently provided for 45,053 phrases and 35,421 utterance units, respectively. (3) Multi-layer: the corpus consists of different levels of spoken language data, such as speech signals, transcribed texts, sentential structures, intentional markers and dialogue structures, and these levels are linked to each other. The multi-layered corpus supports a wide variety of analyses of spontaneous spoken dialogue. This paper also reports the results of an investigation of the corpus, focusing in particular on the relations between the syntactic style and the intentional style of spoken utterances.
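
As a rough illustration of the multi-layer idea, the sketch below links the speech, transcript, sentential-structure and intention layers in one record per utterance unit, then cross-tabulates a syntactic label against the intention tag. All names here (`Phrase`, `UtteranceUnit`, `syntactic_style`, `cross_tabulate`), the intention labels, the sample utterances and the "single-phrase"/"multi-phrase" distinction are assumptions made for illustration; the abstract does not specify the corpus format or the style categories actually used.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Phrase:
    text: str
    head: Optional[int]  # index of the phrase this one depends on; None for the root


@dataclass
class UtteranceUnit:
    start_sec: float        # speech-signal layer: span within the recording
    end_sec: float
    transcript: str         # transcribed-text layer
    phrases: List[Phrase]   # sentential-structure layer (dependency links)
    intention: str          # intention layer (hypothetical label set)


def syntactic_style(unit: UtteranceUnit) -> str:
    """Hypothetical coarse label; not the classification used in the paper."""
    return "single-phrase" if len(unit.phrases) <= 1 else "multi-phrase"


def cross_tabulate(units: List[UtteranceUnit]) -> Counter:
    """Count how often each (syntactic style, intention) pair occurs."""
    return Counter((syntactic_style(u), u.intention) for u in units)


# Invented sample data, only to show how the layers line up.
units = [
    UtteranceUnit(0.0, 1.2, "find a restaurant nearby",
                  [Phrase("a restaurant nearby", 1), Phrase("find", None)],
                  "request"),
    UtteranceUnit(1.5, 1.8, "yes", [Phrase("yes", None)], "confirm"),
    UtteranceUnit(2.0, 3.1, "turn on the radio",
                  [Phrase("the radio", 1), Phrase("turn on", None)],
                  "request"),
]

table = cross_tabulate(units)
for (style, intention), count in sorted(table.items()):
    print(style, intention, count)
```

Keeping every layer on the same record is one simple way to make the layers "linked to each other"; a real corpus might instead store each layer separately and join them by utterance identifier.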

Similar resources

Construction of Back-Channel Utterance Corpus for Responsive Spoken Dialogue System Development

In spoken dialogues, if a spoken dialogue system does not respond at all during user’s utterances, the user might feel uneasy because the user does not know whether or not the system has recognized the utterances. In particular, back-channel utterances, which the system outputs as voices such as “yeah” and “uh huh” in English, have important roles for a driver in in-car speech dialogues because the ...

Coherent Back-Channel Feedback Tagging of In-Car Spoken Dialogue Corpus

This paper describes the design of a backchannel feedback corpus and its evaluation, aiming at realizing in-car spoken dialogue systems with high responsiveness. We constructed our corpus by annotating the existing in-car spoken dialogue data with back-channel feedback timing information in an off-line environment. Our corpus can be practically used in developing dialogue systems which can prov...

Construction and Evaluation of a Large In-Car Speech Corpus

In this paper, we discuss the construction of a large in-car spoken dialogue corpus and the result of its analysis. We have developed a system specially built into a Data Collection Vehicle (DCV) which supports the synchronous recording of multichannel audio data from 16 microphones that can be placed in flexible positions, multichannel video data from 3 cameras, and vehicle related data. Multi...

Example-based Speech Intention Understanding and Its Application to In-Car Spoken Dialogue System

This paper proposes a method of speech intention understanding based on dialogue examples. The method uses a spoken dialogue corpus with intention tags to regard the intention of each input utterance as that of the sentence to which it is the most similar in the corpus. The degree of similarity is calculated according to the degree of correspondence in morphemes and dependencies between sentenc...

Stochastic Dependency Parsing of Spontaneous Japanese Spoken Language

This paper describes the characteristic features of dependency structures of Japanese spoken language by investigating a spoken dialogue corpus, and proposes a stochastic approach to dependency parsing. The method can robustly cope with inversion phenomena and bunsetsus which don’t have the head bunsetsu by relaxing the syntactic dependency constraints. The method acquires in advance the probab...

Publication date: 2003